Certified Defenses against Adversarial Examples
Authors
Aditi Raghunathan, Jacob Steinhardt, Percy Liang
Abstract
While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but are often followed by new, stronger attacks that defeat them. Can we somehow end this arms race? In this work, we study this problem for neural networks with one hidden layer. We first propose a method based on a semidefinite relaxation that outputs a certificate that, for a given network and test input, no attack can force the error to exceed a certain value. Second, as this certificate is differentiable, we jointly optimize it with the network parameters, providing an adaptive regularizer that encourages robustness against all attacks. On MNIST, our approach produces a network and a certificate that no attack that perturbs each pixel by at most ε = 0.1 can cause more than 35% test error.
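The paper's actual certificate comes from a semidefinite relaxation whose exact matrix construction is not reproduced on this page, so the sketch below substitutes a much cruder, well-known technique, interval bound propagation, just to make concrete what a differentiable robustness certificate computes for a one-hidden-layer ReLU network. Everything here (the function name, the toy weights, the unclipped input interval) is an illustrative assumption, not code from the paper.

import numpy as np

def certified_margin_lower_bound(W, b, V, v0, x, y, eps):
    # Lower-bounds min_{j != y} [f_y(x') - f_j(x')] over all ||x' - x||_inf <= eps
    # for f(x) = V @ relu(W @ x + b) + v0, via interval arithmetic.
    # A positive return value certifies that no eps-bounded attack at x
    # can change the predicted label.
    mu = W @ x + b                       # center of the pre-activation box
    r = eps * np.abs(W).sum(axis=1)      # Hoelder: |w . delta| <= eps * ||w||_1
    h_lo = np.maximum(mu - r, 0.0)       # ReLU is monotone, so the hidden
    h_hi = np.maximum(mu + r, 0.0)       # activations stay in [h_lo, h_hi]

    worst = np.inf
    for j in range(V.shape[0]):
        if j == y:
            continue
        d = V[y] - V[j]                  # weights of the margin f_y - f_j
        # Worst case over the hidden box: negative coordinates of d are
        # pushed to the upper end of the interval, positive ones to the lower.
        lo = np.where(d >= 0, d * h_lo, d * h_hi).sum() + (v0[y] - v0[j])
        worst = min(worst, lo)
    return worst

# Toy usage on a random network (pixel-range clipping omitted for brevity).
rng = np.random.default_rng(0)
W, b = 0.05 * rng.normal(size=(50, 784)), np.zeros(50)
V, v0 = 0.05 * rng.normal(size=(10, 50)), np.zeros(10)
x, y = rng.uniform(0.0, 1.0, size=784), 3
print(certified_margin_lower_bound(W, b, V, v0, x, y, eps=0.1))

Like the SDP certificate in the paper, this interval bound is differentiable in (W, b, V, v0), so the same idea of training against the certificate applies; the paper's contribution is a far tighter bound of this kind, not the training loop itself.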
Similar resources
MagNet and "Efficient Defenses Against Adversarial Attacks" are Not Robust to Adversarial Examples
MagNet and “Efficient Defenses...” were recently proposed as defenses against adversarial examples. We find that we can construct adversarial examples that defeat these defenses with only a slight increase in distortion.
Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong
Ongoing research has proposed several methods to defend neural networks against adversarial examples, many of which researchers have shown to be ineffective. We ask whether a strong defense can be created by combining multiple (possibly weak) defenses. To answer this question, we study three defenses that follow this approach. Two of these are recently proposed defenses that intentionally combi...
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
We identify obfuscated gradients as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat optimization-based attacks, we find defenses relying on this effect can be circumvented. For each of the three types of obfuscated gradients we discover, we describe indicators of defenses exhibiting th...
Adversarial Examples: Attacks and Defenses for Deep Learning
With rapid progress and great successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial examples are imperceptible to humans but can easily fool deep neural networks in the testing/deploying stage. The ...
Journal: CoRR
Volume: abs/1801.09344
Pages: -
Publication date: 2018